Efficient Generation of Binary Magic Squares
We propose a simple algorithm for generating Binary Magic Squares (BMS), i.e., square binary matrices in which the sums of all rows and all columns are equal. We show by induction that our algorithm always returns a valid BMS, with optimal theoretical complexity. We then extend our study to non-square Binary Magic Squares, formalize the conditions on the row and column sums under which such BMS exist, and show that a slight variant of our first algorithm provably generates them. Finally, we publicly release two implementations of our algorithm as Python packages, including one that can generate several BMS in parallel using GPU acceleration.
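The abstract above does not spell out the algorithm; as a minimal sketch of the object being generated, a circulant construction produces a square binary matrix with equal row and column sums (this is one well-known construction, not necessarily the paper's algorithm):

```python
import numpy as np

def binary_magic_square(n, k):
    # Circulant construction: row i is the base row of k ones cyclically
    # shifted by i, so every row and every column sums to exactly k.
    assert 0 <= k <= n
    base = np.array([1] * k + [0] * (n - k))
    return np.stack([np.roll(base, i) for i in range(n)])
```

Each column receives every entry of the base row exactly once, which is why the column sums match the row sums.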
SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs
Jie, Shibo, Tang, Yehui, Han, Kai, Deng, Zhi-Hong, Han, Jing
Transformer-based large language models (LLMs) have achieved remarkable results on long-text tasks, but limited GPU memory (VRAM) struggles to accommodate the linearly growing key-value (KV) cache as the sequence length increases, which has become a bottleneck for applying LLMs to long sequences. Existing KV cache compression methods evict, merge, or quantize the KV cache to reduce its size. However, compression causes irreversible information forgetting, potentially affecting the accuracy of subsequent decoding. In this paper, we propose SpeCache, which takes full advantage of large and easily expandable CPU memory to offload the complete KV cache, and dynamically fetches KV pairs back at each decoding step based on their importance, as measured by a low-bit copy of the KV cache kept in VRAM. To avoid the inference latency caused by CPU-GPU communication, SpeCache speculatively predicts the KV pairs that the next token might attend to, allowing us to prefetch them before the next decoding step and thereby parallelize prefetching and computation. Experiments on the LongBench and Needle-in-a-Haystack benchmarks verify that SpeCache effectively reduces VRAM usage while avoiding information forgetting on long sequences without re-training, even at a 10x KV cache compression ratio.
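The core ranking idea in the abstract can be sketched in a toy form: score cached keys against the query using only a low-bit (here int8) copy, and return the indices of the KV pairs worth fetching at full precision. This is an illustrative simplification, not SpeCache's actual implementation (which additionally predicts one step ahead so the fetch overlaps with computation):

```python
import numpy as np

def quantize_int8(x):
    # Symmetric int8 quantization: only this low-bit copy stays in "VRAM".
    scale = np.abs(x).max() / 127 + 1e-8
    return np.round(x / scale).astype(np.int8), scale

def select_topk_kv(query, k_cache, top_k):
    # Score cached keys with the dequantized low-bit copy, then return the
    # indices of the most important KV pairs to prefetch at full precision.
    k_q, scale = quantize_int8(k_cache)
    scores = query @ (k_q.astype(np.float32) * scale).T
    return np.argsort(scores)[::-1][:top_k]
```

Because only the ranking matters, quantization error in the scores is tolerable as long as it preserves the order of the most-attended keys.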
Review for NeurIPS paper: Efficient Generation of Structured Objects with Constrained Adversarial Networks
Weaknesses: - The method section does not appear self-contained and lacks descriptions of some key components. In particular: * What is Eq.(9) for? The paper states that "the SL is the negative logarithm of a polynomial in \theta" -- but where is the "negative logarithm" in Eq.(9)? It appears that its practical implementation is discussed in the "Evaluating the Semantic Loss" part (L.140), which involves the Weighted Model Count (WMC) and knowledge compilation (KC). However, no details about KC are presented.
Review for NeurIPS paper: Efficient Generation of Structured Objects with Constrained Adversarial Networks
This work aims at estimating generative distributions of structured objects that satisfy certain semantic constraints (expressed in first-order logic). The authors achieve this goal by adding a "semantic loss" to the GAN's learning objective and using knowledge compilation (KC) to build a circuit that allows efficient evaluation. Experiments on game-level generation tasks and a molecule generation task support the proposed method. Strengths: i) incorporating structured constraints in GAN models is both intellectually and practically interesting; ii) the experiments are comprehensive and convincing in most cases; and iii) the paper is clearly written for the most part. The paper is recommended for acceptance.
Efficient Generation of Structured Objects with Constrained Adversarial Networks
Generative Adversarial Networks (GANs) struggle to generate structured objects like molecules and game maps. The issue is that structured objects must satisfy hard requirements (e.g., molecules must be chemically valid) that are difficult to acquire from examples alone. As a remedy, we propose Constrained Adversarial Networks (CANs), an extension of GANs in which the constraints are embedded into the model during training. This is achieved by penalizing the generator proportionally to the mass it allocates to invalid structures. In contrast to other generative models, CANs support efficient inference of valid structures (with high probability) and allow the learned constraints to be switched on and off at inference time.
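The penalty described above, combined with the review's mention of the semantic loss as a Weighted Model Count, can be illustrated on a toy constraint. The sketch below (an illustration, not the paper's circuit-based implementation) computes the semantic loss for the constraint "exactly one bit is set" by enumerating the valid states:

```python
import numpy as np

def semantic_loss_one_hot(p):
    # WMC of the "exactly one bit set" constraint under independent
    # Bernoulli probabilities p, computed by enumerating valid states.
    wmc = sum(p[i] * np.prod(np.delete(1.0 - p, i)) for i in range(len(p)))
    return -np.log(wmc)  # semantic loss = -log(mass on valid structures)
```

The loss is zero exactly when the generator places all its mass on valid structures; in practice KC circuits replace the enumeration, which is exponential for general constraints.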
Efficient Generation of Molecular Clusters with Dual-Scale Equivariant Flow Matching
Subramanian, Akshay, Qu, Shuhui, Park, Cheol Woo, Liu, Sulin, Lee, Janghwan, Gómez-Bombarelli, Rafael
Amorphous molecular solids offer a promising alternative to inorganic semiconductors, owing to their mechanical flexibility and solution processability. The packing structure of these materials plays a crucial role in determining their electronic and transport properties, which are key to enhancing the efficiency of devices like organic solar cells (OSCs). However, obtaining these optoelectronic properties computationally requires molecular dynamics (MD) simulations to generate a conformational ensemble, a process that can be computationally expensive due to the large system sizes involved. Recent advances have focused on using generative models, particularly flow-based models as Boltzmann generators, to improve the efficiency of MD sampling. In this work, we develop a dual-scale flow matching method that separates training and inference into coarse-grained and all-atom stages, enhancing both the accuracy and efficiency of standard flow matching samplers. We demonstrate the effectiveness of this method on a dataset of Y6 molecular clusters obtained through MD simulations, and we benchmark its efficiency and accuracy against single-scale flow matching methods.
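For readers unfamiliar with the base technique the abstract builds on, standard (single-scale, linear-path) flow matching regresses a velocity field onto the displacement between paired samples; the dual-scale method applies this at coarse-grained and all-atom resolutions. A minimal sketch of the base objective only, with `model` standing in for any velocity-field network:

```python
import numpy as np

def flow_matching_loss(model, x0, x1, rng):
    # Linear-path flow matching: regress the model's velocity field onto
    # the constant target x1 - x0 along straight-line interpolants x_t.
    t = rng.random((x0.shape[0], 1))
    xt = (1.0 - t) * x0 + t * x1
    return np.mean((model(xt, t) - (x1 - x0)) ** 2)
```

A model that outputs the true displacement everywhere drives the loss to zero, which is the sense in which the learned field transports samples from the source to the target distribution.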
ToDo: Token Downsampling for Efficient Generation of High-Resolution Images
Smith, Ethan, Saxena, Nayan, Saha, Aninda
The attention mechanism has been crucial for image diffusion models; however, its quadratic computational complexity limits the sizes of images we can process within reasonable time and memory constraints. This paper investigates the importance of dense attention in generative image models, which often contain redundant features, making them suitable for sparser attention mechanisms. We propose ToDo, a novel training-free method that relies on token downsampling of key and value tokens to accelerate Stable Diffusion inference by up to 2x for common image sizes and up to 4.5x or more for high resolutions such as 2048x2048. We demonstrate that our approach outperforms previous methods in balancing efficient throughput and fidelity.
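The key-and-value downsampling described above can be sketched as strided subsampling of tokens laid out on their spatial grid, with queries kept at full resolution so the output token count is unchanged. This is a toy illustration, not ToDo's exact downsampling scheme:

```python
import numpy as np

def downsample_kv(kv, h, w, stride=2):
    # kv: (h*w, d) tokens on an h x w spatial grid; keep every
    # `stride`-th token in each direction, quartering the count at stride 2.
    grid = kv.reshape(h, w, -1)
    return grid[::stride, ::stride].reshape(-1, grid.shape[-1])

def attention(q, k, v):
    # Standard scaled dot-product attention; queries stay full resolution,
    # so cost drops from O(n^2) to O(n * n/stride^2).
    s = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ v
```

Since neighboring image tokens are highly redundant, attending to a strided subset of keys and values changes the output little while cutting the attention cost by the square of the stride.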
Planning Graphs for Efficient Generation of Desirable Narrative Trajectories
Amos-Binks, Adam (North Carolina State University) | Potts, Colin (North Carolina State University) | Young, R. Michael (University of Utah)
A goal of Experience Managers (EMs) is to guide users through a space of narrative trajectories, or story branches, in an Interactive Narrative (IN). When a user performs an action that deviates from the intended trajectory, the EM uses a mediation strategy called accommodation to transition the user to a new desirable trajectory. However, generating the trajectory options and then selecting the appropriate one is computationally expensive and at odds with the low-latency needs of an IN. We define three desirable properties (exemplar trajectories, narrative-theoretic comparison, and efficiency) that general solutions should possess and demonstrate how our plan-based Intention Dependency Graph addresses them.